Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid parsing joint rule codes as distinct codes in # noqa #12809

Merged
merged 1 commit into from
Nov 2, 2024

Conversation

charliermarsh
Copy link
Member

Summary

We should enable warnings for unsupported codes, but this at least fixes the parsing for # noqa: F401F841.

Closes #12808.

@charliermarsh charliermarsh added bug Something isn't working suppression Related to supression of violations e.g. noqa labels Aug 11, 2024
@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Aug 11, 2024

is_alphanumeric() also allows non-ASCII-digits and letters:

# noqa: A三

Did you mean to use is_ascii_alphanumeric() instead?

@charliermarsh
Copy link
Member Author

Seems reasonable to exclude non-ASCII.

Copy link

codspeed-hq bot commented Aug 11, 2024

CodSpeed Performance Report

Merging #12809 will not alter performance

Comparing charlie/parse (c2e9746) with main (f837428)

Summary

✅ 32 untouched benchmarks

@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Aug 11, 2024

On that same note, is_whitespace() and trim_start() / trim_end() also handle Unicode whitespace. They should be replaced with is_ascii_whitespace() and trim_ascii_start() / trim_ascii_end(), accordingly.

"ASCII whitespace" is defined as "space, tab, LF, FF, CR". In our case, since Python comments don't allow line breaks, that boils down to just spaces and tabs.

Copy link
Contributor

github-actions bot commented Aug 11, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+3 -0 violations, +0 -0 fixes in 2 projects; 52 projects unchanged)

apache/airflow (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --no-preview --select ALL

+ providers/src/airflow/providers/apache/hdfs/sensors/hdfs.py:45:37: RUF100 [*] Unused `noqa` directive (unknown: `Ignore`)
+ providers/src/airflow/providers/apache/hdfs/sensors/hdfs.py:50:38: RUF100 [*] Unused `noqa` directive (unknown: `Ignore`)
+ providers/src/airflow/providers/google/cloud/hooks/bigquery.py:59:42: RUF100 [*] Unused `noqa` directive (unknown: `Used`)

pandas-dev/pandas (+0 -0 violations, +0 -0 fixes)


Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
RUF100 3 3 0 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+3 -0 violations, +0 -0 fixes in 1 projects; 53 projects unchanged)

apache/airflow (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ providers/src/airflow/providers/apache/hdfs/sensors/hdfs.py:45:37: RUF100 [*] Unused `noqa` directive (unknown: `Ignore`)
+ providers/src/airflow/providers/apache/hdfs/sensors/hdfs.py:50:38: RUF100 [*] Unused `noqa` directive (unknown: `Ignore`)
+ providers/src/airflow/providers/google/cloud/hooks/bigquery.py:59:42: RUF100 [*] Unused `noqa` directive (unknown: `Used`)

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
RUF100 3 3 0 0 0

@charliermarsh
Copy link
Member Author

But Python comments can contain non-ASCII whitespace? We already have an is_python_whitespace that respects the spec for tokens: https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens

@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Aug 11, 2024

But Python comments can contain non-ASCII whitespace?

Fair enough, though I still don't see why someone would want to use non-ASCII whitespace in the noqa part of a comment. A user might want to write an explanation in their native language in the same comment, sure, but noqa and the rule codes are never localized.

@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Aug 12, 2024

One more thing: ParsedFileExemption::try_extract() currently treats A001 , B002 (whitespace before comma) and A001,,B002 (empty slot) as if B002 doesn't exist. Is this difference in behaviour intended?


#[test]
fn noqa_empty_comma() {
let source = "# noqa: F401,,F841";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should show a warning in this case but still parse the remaining codes (if that's possible currently). This would be similar to how we recover from a missing element in list parsing: https://play.ruff.rs/308ab356-9355-43f3-99b7-12c7c2de7334.

@charliermarsh charliermarsh force-pushed the charlie/parse branch 2 times, most recently from 464bb82 to 5894aa0 Compare November 2, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working suppression Related to supression of violations e.g. noqa
Projects
None yet
Development

Successfully merging this pull request may close these issues.

noqa comments parsing logic
3 participants